NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


Multivariate dynamic mediation analysis under a reinforcement learning framework

https://doi.org/10.1214/24-AOS2475

Luo, Lan; Shi, Chengchun; Wang, Jitao; Wu, Zhenke; Li, Lexin (February 2025, The Annals of Statistics)

Free, publicly-accessible full text available February 1, 2026
Statistics and AI: A Fireside Conversation

https://doi.org/10.1162/99608f92.c066fe9c

Lin, Xihong; Cai, Tianxi; Donoho, David; Fu, Haoda; Ke, Tracy; Jin, Jiashun; Meng, Xiao-Li; Qu, Annie; Shi, Chengchun; Song, Peter; et al. (January 2025, Harvard Data Science Review)

Full Text Available
Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons

https://doi.org/10.1080/01621459.2022.2106868

Shi, Chengchun; Luo, Shikai; Le, Yuan; Zhu, Hongtu; Song, Rui (January 2024, Journal of the American Statistical Association)

Full Text Available
Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process

https://doi.org/10.1080/01621459.2022.2110878

Shi, Chengchun; Zhu, Jin; Shen, Ye; Luo, Shikai; Zhu, Hongtu; Song, Rui (January 2024, Journal of the American Statistical Association)

Full Text Available
Testing Directed Acyclic Graph via Structural, Supervised and Generative Adversarial Learning

https://doi.org/10.1080/01621459.2023.2220169

Shi, Chengchun; Zhou, Yunzhe; Li, Lexin (July 2023, Journal of the American Statistical Association)

Full Text Available
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Uehara, Masatoshi; Kiyohara, Haruka; Bennett, Andrew; Chernozhukov, Victor; Jiang, Nan; Kallus, Nathan; Shi, Chengchun; Sun, Wen (December 2023, Neural Information Processing Systems (NeurIPS 2023))

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs and play a role similar to that of classical value functions in fully observable MDPs. We derive a new off-policy Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain a PAC result, which implies that our OPE estimator is close to the true policy value under Bellman completeness, as long as futures and histories contain sufficient information about latent states.
Full Text Available
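The abstract above contrasts the proposed method with sequential importance sampling, whose variance can grow exponentially with the horizon. The sketch below illustrates that baseline only, not the future-dependent method: a per-decision importance sampling estimator, assuming the action probabilities under both the evaluation and the logging policy are known for every logged step, plus a toy calculation of how the variance of the cumulative weight grows when the per-step ratios are i.i.d. Both function names are hypothetical.

```python
def per_decision_is(rewards, target_probs, behavior_probs, gamma=0.99):
    """Per-decision importance sampling estimate of a policy's value (illustrative).

    Each argument is a list of trajectories; entry [i][t] holds the reward,
    pi(a_t | o_t) under the evaluation policy, or b(a_t | o_t) under the
    logging policy, respectively (assumed known here).
    """
    total = 0.0
    for r_traj, p_traj, b_traj in zip(rewards, target_probs, behavior_probs):
        weight = 1.0    # running product of likelihood ratios up to step t
        discount = 1.0  # gamma ** t
        value = 0.0
        for r, p, b in zip(r_traj, p_traj, b_traj):
            weight *= p / b
            value += discount * weight * r
            discount *= gamma
        total += value
    return total / len(rewards)


def product_weight_variance(target, behavior, horizon):
    """Exact variance of a product of `horizon` i.i.d. per-step ratios.

    Assumes the target distribution is supported where the behavior one is,
    so E_b[w] = 1 and Var = (E_b[w^2]) ** horizon - 1.
    """
    second_moment = sum(p * p / b for p, b in zip(target, behavior))  # E_b[w^2]
    return second_moment ** horizon - 1.0
```

With a uniform two-action logging policy and a target policy that picks one action with probability 0.9, the per-step second moment of the ratio is 1.64, so the cumulative weight's variance is 0.64 at horizon 1 and roughly 140 at horizon 10.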
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Uehara, Masatoshi; Kiyohara, Haruka; Bennett, Andrew; Chernozhukov, Victor; Jiang, Nan; Kallus, Nathan; Shi, Chengchun; Sun, Wen (December 2023, 37th Conference on Neural Information Processing Systems (NeurIPS 2023))

Full Text Available
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Uehara, Masatoshi; Kiyohara, Haruka; Bennett, Andrew; Chernozhukov, Victor; Jiang, Nan; Kallus, Nathan; Shi, Chengchun; Sun, Wen (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
Testing for the Markov property in time series via deep conditional generative learning

https://doi.org/10.1093/jrsssb/qkad064

Zhou, Yunzhe; Shi, Chengchun; Li, Lexin; Yao, Qiwei (June 2023, Journal of the Royal Statistical Society Series B: Statistical Methodology)

The Markov property is widely imposed in the analysis of time series data. Correspondingly, testing the Markov property, and relatedly, inferring the order of a Markov model, are of paramount importance. In this article, we propose a nonparametric test for the Markov property in high-dimensional time series via deep conditional generative learning. We also apply the test sequentially to determine the order of the Markov model. We show that the test controls the type-I error asymptotically and has power approaching one. Our proposal makes novel contributions in several ways. We utilise and extend state-of-the-art deep generative learning to estimate the conditional density functions, and establish a sharp upper bound on the approximation error of the estimators. We derive a doubly robust test statistic, which employs nonparametric estimation but achieves a parametric convergence rate. We further adopt sample splitting and cross-fitting to minimise the conditions required to ensure the consistency of the test. We demonstrate the efficacy of the test through both simulations and three data applications.
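The abstract above credits sample splitting and cross-fitting with minimising the conditions needed for the test's consistency. The generic pattern is sketched below under stated assumptions: `fit` and `evaluate` are hypothetical user-supplied callables, and the toy usage fits a sample mean. In the paper the nuisance estimators are deep conditional generative models and the actual test statistic is not reproduced here; the sketch only shows how each observation is scored by a model that never saw it.

```python
def cross_fit(data, fit, evaluate, n_folds=2):
    """Generic K-fold cross-fitting (illustrative sketch).

    For each fold, a nuisance model is fitted on all other folds and then
    evaluated only on the held-out fold. Returns per-point evaluations in
    the original order; these would then be aggregated into a statistic.
    """
    n = len(data)
    # Interleaved fold assignment; a random assignment works equally well.
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    out = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(n) if i not in held]
        model = fit(train)  # never sees the held-out points
        for i in held_out:
            out[i] = evaluate(model, data[i])
    return out


def mean(xs):
    return sum(xs) / len(xs)


# Toy usage: out-of-fold residuals from a sample-mean "model".
residuals = cross_fit([1, 2, 3, 4], fit=mean, evaluate=lambda m, x: x - m)
```

Here the points at positions 0 and 2 are scored against the mean of 2 and 4, and the others against the mean of 1 and 3, giving residuals [-2.0, 0.0, 0.0, 2.0].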
A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

Shi, Chengchun; Uehara, Masatoshi; Huang, Jiawei; Jiang, Nan (July 2022, Proceedings of the 39th International Conference on Machine Learning)

We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), where the evaluation policy depends only on observable variables and the behavior policy depends on unobservable latent variables. Existing works either assume no unmeasured confounders, or focus on settings where both the observation and the state spaces are tabular. In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy’s value and the observed data distribution. We next propose minimax estimation methods for learning these bridge functions, and construct three estimators based on these estimated bridge functions, corresponding to a value function-based estimator, a marginalized importance sampling estimator, and a doubly-robust estimator. Our proposal permits general function approximation and is thus applicable to settings with continuous or large observation/state spaces. The nonasymptotic and asymptotic properties of the proposed estimators are investigated in detail. A Python implementation of our proposal is available at https://github.com/jiaweihhuang/Confounded-POMDP-Exp.
Full Text Available
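The abstract above names three estimator types: value function-based, marginalized importance sampling, and doubly robust. How they relate is easiest to see in a single-step, unconfounded analogue (a contextual bandit), sketched below. This is not the paper's bridge-function estimators, and the function and argument names are illustrative assumptions: `weights` are the ratios pi(a_i | x_i) / b(a_i | x_i) for the logged actions, `q_logged` is an outcome model's prediction for the logged action, and `v_target` is that model's value of the target policy in each context.

```python
def ope_estimates(rewards, weights, q_logged, v_target):
    """Direct, importance-weighted and doubly robust estimates (one-step analogue).

    rewards[i]  : observed reward for the logged action in context i
    weights[i]  : pi(a_i | x_i) / b(a_i | x_i)
    q_logged[i] : outcome-model prediction q(x_i, a_i) for the logged action
    v_target[i] : sum over a of pi(a | x_i) * q(x_i, a)
    """
    n = len(rewards)
    # Model-based (value function-based) estimate.
    dm = sum(v_target) / n
    # Importance-weighted estimate.
    ipw = sum(w * r for w, r in zip(weights, rewards)) / n
    # Doubly robust: model-based estimate plus an importance-weighted residual correction.
    dr = dm + sum(w * (r - q) for w, r, q in zip(weights, rewards, q_logged)) / n
    return dm, ipw, dr
```

The correction term in `dr` has mean zero when the outcome model is correct, and it cancels the model's bias when the weights are correct, which is why the doubly robust estimate stays consistent if either component is right. When the model reproduces the observed rewards exactly, the correction vanishes and `dr` equals `dm`.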